pacman::p_load(tidyverse, sf, tmap, httr)Take-home Exercise 1: Geospatial Analytics for Public Good
1 Overview

Inside a Hawker Center (via TripAdvisor)
The Food & Beverage (F&B) and Hospitality sector is a vital pillar of Singapore’s economy, contributing significantly to both employment and tourism-driven revenue. In 2024, international visitor arrivals reached about 16.5 million, and tourism receipts rose to S$29,781 million (or ~SGD 29.8 billion), a strong recovery to pre-pandemic levels. Among the components of tourism spend, expenditure on Food & Beverage increased by 6% year-on-year in January - September 2024. The F&B services sector also plays a major role domestically: for example, in July 2025, its sales reached S$1.0 billion, with fast food outlets and food caterers showing notable growth, and approximately a quarter of F&B sales happening online.
Beyond raw numbers, the sector deeply shapes everyday life—socializing, culture, lifestyle—through restaurants, hawker centers, cafés, bars, catering, etc. It’s also a key part of the hospitality value chain: hotels not only provide lodging but often generate substantial revenue from associated F&B (restaurants, banquets). The dynamic interplay between F&B manufacturers, drink producers, and service-based outlets influences urban planning (where clusters of eating and dining pop up), business location decisions, land-use, and labour markets.
2 Getting Started
2.1 The Packages
In this exercise, we will use following packages:
| Package | Description |
|---|---|
| sf | Provides functions to manage, processing, and manipulate Simple Features, a formal geospatial data standard that specifies a storage and access model of spatial geometries such as points, lines, and polygons. |
| spatstat | Provides functions for spatial statistics with a strong focus on analysing spatial point patterns. |
| raster | Provides functions which reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster). In this hands-on exercise, it will be used to convert image output generate by spatstat into raster format. |
| tidyverse | Provides collection of functions for performing data science task such as importing, tidying, wrangling data and visualising data. |
| tmap | Provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API |
| httr | Provide a wrapper for the curl package, customised to the demands of modern web APIs. Key features: Functions for the most important http verbs: GET(), HEAD(), PATCH(), PUT(), DELETE() and POST(). |
| dplyr | Provides a set of verbs that help solving the most common data manipulation challenges. |
| lubridate | Handle date and date-time data type. |
2.2 The Data
2.2.1 Business Categories Overview
Using the Singapore Standard Industrial Classification (SSIC) 2020, three business categories were selected for analysis: Food and Beverage Service Activities (Division 56), Manufacture of Other Food Products (Group 107), and Manufacture of Beverages (Group 110). Together, they span the spectrum from food production and processing to retail consumption, providing a comprehensive view of how food-related businesses emerge and evolve across the city. Examining their spatial and temporal patterns allows us to understand not only where F&B establishments are located, but also how they interact with upstream manufacturers and beverage producers. This holistic perspective offers valuable insights into the clustering of economic activities, the shaping of commercial districts, and the role of the F&B ecosystem in supporting urban development and community life.
Below is the details of the business type selected:
56 – Food and Beverage Service Activities
- Division 56: Food and beverage service activities
Includes: Restaurants, cafés, fast food outlets, food courts, coffee shops, hawker centers, caterers, pubs, bars, nightclubs, and other drinking places.
Covers businesses primarily engaged in preparing meals, snacks, and drinks for immediate consumption.
107 – Manufacture of Other Food Products
Division 10: Manufacture of food products
Group 107: Manufacture of other food products
Includes:
1071 – Manufacture of bakery products (bread, biscuits, cakes, confectionery)
1074 – Manufacture of macaroni, noodles, vermicelli and related products
1075 – Manufacture of prepared meals and dishes (e.g., frozen dinners)
1076 – Manufacture of coffee, tea and related products
1079 – Manufacture of other food products n.e.c. (sauces, soya products, spices, chips, crackers, titbits)
110 – Manufacture of Beverages
Division 11: Manufacture of beverages
Group 110: Manufacture of beverages
Includes:
1101 – Distilling, rectifying and blending of spirits
1102 – Manufacture of wines
1103 – Manufacture of malt liquors and malt (beer, stout)
1104 – Manufacture of soft drinks, bottled/mineral waters, cordials, syrups, composite concentrates, and ice.
2.2.2 Datasets
| Dataset Name | Description | Format | Source |
|---|---|---|---|
| ACRA (Accounting and Corporate Regulatory Authority) Information on Corporate Entities | Datasets of registered entities in Singapore. (We will select business entities registered between 1st January 2024 to 30th June 2025.) | CSV | data.gov.sg |
| SSIC: Singapore Standard Industrial Classification (SSIC) | Official system for classifying economic activity of business establishments in Singapore. | XLSX | SSIC |
| Singapore Master Plan 2019 (URA) Zoning Data | Singapore’s land-use zones: residential, commercial, mixed-use, industrial. | GEOJSON | data.gov.sg |
| LTA MRT Station Exit | Geospatial data for Singapore’s MRT/LRT stations exit. | GEOJSON | data.gov.sg |
2.3 Data Importing & Wrangling
2.3.1 Importing ACRA data
The ACRA Corporate Entities CSV files were imported into R using tidyverse. The first 24 columns were retained, covering entity identifiers, names, incorporation dates, SSIC codes, and addresses. This ensured consistency across all files as we are dropping the unnecessary fields such as fields with excessive missing values or fields with only single values.
folder_path <- "data/aspatial/ACRA"
file_list <- list.files(path = folder_path,
pattern = "^ACRA*.*\\.csv$",
full.names = TRUE)
acra_data <- file_list %>%
map_dfr(read_csv)glimpse(acra_data)Rows: 2,026,935
Columns: 53
$ uen <chr> "00022100K", "00031800X", "00043100A…
$ issuance_agency_id <chr> "ACRA", "ACRA", "ACRA", "ACRA", "ACR…
$ entity_name <chr> "A Y ABDUL RAHIMAN", "A M ABDULLAH S…
$ entity_type_description <chr> "Sole Proprietorship/ Partnership", …
$ business_constitution_description <chr> "Partnership", "Partnership", "Partn…
$ company_type_description <chr> "na", "na", "na", "na", "na", "na", …
$ paf_constitution_description <chr> "na", "na", "na", "na", "na", "na", …
$ entity_status_description <chr> "Terminated", "Terminated", "Termina…
$ registration_incorporation_date <date> 1974-10-09, 1974-10-12, 1974-09-20,…
$ uen_issue_date <date> 1974-10-09, 1974-10-12, 1974-09-20,…
$ address_type <chr> "LOCAL", "LOCAL", "LOCAL", "LOCAL", …
$ block <chr> "51", "93", "178", "21", "30", "38A"…
$ street_name <chr> "EAST COAST ROAD", "MARKET STREET", …
$ level_no <chr> "na", "10", "na", "na", "na", "na", …
$ unit_no <chr> "na", "01", "na", "na", "na", "na", …
$ building_name <chr> "na", "na", "na", "na", "na", "na", …
$ postal_code <chr> "428770", "0104", "0718", "0923", "4…
$ other_address_line1 <chr> "na", "na", "na", "na", "na", "na", …
$ other_address_line2 <chr> "na", "na", "na", "na", "na", "na", …
$ account_due_date <chr> "na", "na", "na", "na", "na", "na", …
$ annual_return_date <chr> "na", "na", "na", "na", "na", "na", …
$ primary_ssic_code <dbl> 47722, 46301, 68104, 56111, 47102, 6…
$ primary_ssic_description <chr> "na", "na", "na", "na", "na", "na", …
$ primary_user_described_activity <chr> "na", "na", "na", "na", "na", "na", …
$ secondary_ssic_code <chr> "na", "32909", "na", "56122", "na", …
$ secondary_ssic_description <chr> "na", "na", "na", "na", "na", "na", …
$ secondary_user_described_activity <chr> "na", "na", "na", "na", "na", "na", …
$ no_of_officers <dbl> 7, 6, 3, 1, 5, 2, 2, 1, 4, 2, 2, 1, …
$ former_entity_name1 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name2 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name3 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name4 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name5 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name6 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name7 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name8 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name9 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name10 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name11 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name12 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name13 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name14 <chr> "na", "na", "na", "na", "na", "na", …
$ former_entity_name15 <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm1 <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm1 <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm2 <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm2 <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm3 <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm3 <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm4 <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm4 <chr> "na", "na", "na", "na", "na", "na", …
$ uen_of_audit_firm5 <chr> "na", "na", "na", "na", "na", "na", …
$ name_of_audit_firm5 <chr> "na", "na", "na", "na", "na", "na", …
The dataset consists of 2,026,935 rows and 53 columns.
write_rds(acra_data,
"data/rds/acra_data.rds")2.3.2 Tidying ACRA data
Standardisation of Key Fields
Column names were renamed to a common schema (date, ssic_raw, postal_code). Dates were converted into proper R Date objects. SSIC codes were cleaned to numeric strings, and 2-digit (ssic_2d) and 3-digit (ssic_3d) prefixes were extracted. Postal codes were also padded to six digits for consistency.
Temporal Filtering
Only businesses registered between 01-Jan-2024 and 30-Jun-2025 were retained, aligning with the study’s spatio-temporal scope. Records with missing or invalid dates were excluded.
Business Type Selection
The dataset was filtered to include only the Food and Beverage / Hospitality categories: Division 56 – Food & Beverage Service Activities, Group 107 – Manufacture of Other Food Products, Group 110 – Manufacture of Beverages
date_min <- as_date("2024-01-01")
date_max <- as_date("2025-06-30")acra_all_fnb <- acra_data %>%
# keep the first 24 cols
select(1:24) %>%
# commonize column names
rename(
date = registration_incorporation_date,
ssic_raw = primary_ssic_code,
postal_code = postal_code
) %>%
mutate(
# dates
date = as_date(date),
# SSIC as character
ssic_raw = as.character(ssic_raw),
ssic_raw = str_replace_all(ssic_raw, "[^0-9]", ""),
ssic_2d = str_sub(ssic_raw, 1, 2),
ssic_3d = str_sub(ssic_raw, 1, 3),
# clean postal code to 6-digit with zeros
postal_code = str_pad(postal_code, width = 6, side = "left", pad = "0")
) %>%
filter(!is.na(date), date >= date_min, date <= date_max) %>%
# target Division 56 OR Group 107 OR Group 110
filter(ssic_2d == "56" | ssic_3d %in% c("107", "110")) %>%
# time fields
mutate(
YEAR = year(date),
MONTH_NUM = month(date),
MONTH_ABBR = month(date, label = TRUE, abbr = TRUE),
fnb_bucket = case_when(
ssic_2d == "56" ~ "F&B Services (Div 56)",
ssic_3d == "107" ~ "Food Manufacturing (Grp 107)",
ssic_3d == "110" ~ "Beverage Manufacturing (Grp 110)",
TRUE ~ "Other"
)
)Sanity Checks & Subsets
Basic counts by fnb_bucket were generated to confirm category distribution. Monthly registration summaries were produced to validate temporal coverage. Separate data frames (biz_56, biz_107, biz_110) were created for more detailed subgroup analysis.
# Counts by bucket
acra_all_fnb %>% count(fnb_bucket, sort = TRUE)# A tibble: 3 × 2
fnb_bucket n
<chr> <int>
1 F&B Services (Div 56) 5758
2 Food Manufacturing (Grp 107) 737
3 Beverage Manufacturing (Grp 110) 30
# Monthly registrations for later plotting
acra_all_fnb %>%
count(YYYYMM = floor_date(date, "month"), fnb_bucket) %>%
arrange(YYYYMM, fnb_bucket)# A tibble: 52 × 3
YYYYMM fnb_bucket n
<date> <chr> <int>
1 2024-01-01 Beverage Manufacturing (Grp 110) 2
2 2024-01-01 F&B Services (Div 56) 337
3 2024-01-01 Food Manufacturing (Grp 107) 51
4 2024-02-01 Beverage Manufacturing (Grp 110) 1
5 2024-02-01 F&B Services (Div 56) 236
6 2024-02-01 Food Manufacturing (Grp 107) 39
7 2024-03-01 Beverage Manufacturing (Grp 110) 1
8 2024-03-01 F&B Services (Div 56) 338
9 2024-03-01 Food Manufacturing (Grp 107) 43
10 2024-04-01 Beverage Manufacturing (Grp 110) 1
# ℹ 42 more rows
# separate data frames for each bucket
biz_56 <- acra_all_fnb %>% filter(ssic_2d == "56")
biz_107 <- acra_all_fnb %>% filter(ssic_3d == "107")
biz_110 <- acra_all_fnb %>% filter(ssic_3d == "110")# Date column named `date` and a month column for plotting later:
acra_all_fnb <- acra_all_fnb %>%
mutate(
date = as.Date(date),
month = floor_date(date, "month"),
postal_code = stringr::str_pad(postal_code, 6, pad = "0")
)
biz_56 <- biz_56 %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))
biz_107 <- biz_107 %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))
biz_110 <- biz_110 %>% mutate(postal_code = stringr::str_pad(postal_code, 6, pad = "0"))2.3.3 Geocoding
Extract Unique Postcodes
Unique six-digit postal codes were extracted from the ACRA dataset. To ensure consistency, codes were padded with leading zeros where necessary and invalid entries (missing or non-numeric) were excluded.
Query the OneMap API
Each postcode was sent to the OneMap Singapore API using the code chunk below. The API returned address details and coordinates (X, Y in SVY21 projection; LATITUDE, LONGITUDE in WGS84). Postcodes with no matches were logged separately for later review.
postcodes <- unique(acra_all_fnb$postal_code)
url <- "https://onemap.gov.sg/api/common/elastic/search"
found <- data.frame()
not_found <- data.frame(postcode = character())
for (pc in postcodes) {
query <- list(
searchVal = pc,
returnGeom = "Y",
getAddrDetails = "Y",
pageNum = "1"
)
res <- GET(url, query = query)
json <- content(res)
if(json$found != 0) {
df <- as.data.frame(json$results, stringsAsFactors = FALSE)
df$input_postcode <- pc
found <- bind_rows(found, df)
} else {
not_found <- bind_rows(not_found, data.frame(postcode = pc))
}
}Append Geocoding Results
Geocoding was performed by extracting unique six-digit postal codes from the ACRA dataset and submitting them to the OneMap Singapore API. The API returned both address details and geographic coordinates, which were then joined back to the business records. Invalid or missing coordinates were cleaned, and duplicates were removed to ensure data quality. The final datasets were converted into spatial features (sf objects) in the SVY21 projection (EPSG:3414), enabling integration with planning boundaries and spatial analysis. To ensure reproducibility and avoid repeated API calls, the geocoded datasets and unresolved cases were saved as .rds files.
found <- found %>%
#X/Y from OneMap are SVY21 (EPSG:3414); coerce to numeric
mutate(
X = suppressWarnings(as.numeric(X)),
Y = suppressWarnings(as.numeric(Y))
) %>%
# keep the essential columns
select(1:10)acra_all_fnb <- acra_all_fnb %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_56 <- biz_56 %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_107 <- biz_107 %>% left_join(found, by = c("postal_code" = "POSTAL"))
biz_110 <- biz_110 %>% left_join(found, by = c("postal_code" = "POSTAL"))df_to_sf <- function(df, id_col = NULL) {
df2 <- df %>%
mutate(across(any_of(c("X","Y","LONGITUDE","LATITUDE")),
~ suppressWarnings(as.numeric(.x))))
if (all(c("X","Y") %in% names(df2))) {
df2 <- df2 %>% filter(!is.na(X), !is.na(Y))
if (!is.null(id_col) && id_col %in% names(df2)) {
df2 <- df2 %>% group_by(.data[[id_col]]) %>% slice(1) %>% ungroup()
}
st_as_sf(df2, coords = c("X","Y"), crs = 3414) # SVY21
} else if (all(c("LONGITUDE","LATITUDE") %in% names(df2))) {
df2 <- df2 %>% filter(!is.na(LONGITUDE), !is.na(LATITUDE))
if (!is.null(id_col) && id_col %in% names(df2)) {
df2 <- df2 %>% group_by(.data[[id_col]]) %>% slice(1) %>% ungroup()
}
st_as_sf(df2, coords = c("LONGITUDE","LATITUDE"), crs = 4326)
} else {
stop("No coordinate columns found. Expected X/Y or LONGITUDE/LATITUDE. Present: ",
paste(names(df2), collapse = ", "))
}
}
acra_all_fnb_sf <- df_to_sf(acra_all_fnb, id_col = "uen")
biz_56_sf <- df_to_sf(biz_56, id_col = "uen")
biz_107_sf <- df_to_sf(biz_107, id_col = "uen")
biz_110_sf <- df_to_sf(biz_110, id_col = "uen")write_rds(acra_all_fnb, "data/rds/acra_all_fnb.rds")
write_rds(biz_56, "data/rds/biz_56.rds")
write_rds(biz_107, "data/rds/biz_107.rds")
write_rds(biz_110, "data/rds/biz_110.rds")
write_rds(not_found, "data/rds/postcodes_not_found.rds") biz_missing <- acra_all_fnb %>%
mutate(X = suppressWarnings(as.numeric(X)), Y = suppressWarnings(as.numeric(Y))) %>%
filter(is.na(X) | is.na(Y))
write_rds(biz_missing, "data/rds/biz_missing_coords.rds")Visualising the distribution
# Load URA Subzone boundaries
subzones <- st_read("data/geospatial/MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson") %>%
st_transform(crs = 3414)Reading layer `MasterPlan2019SubzoneBoundaryNoSeaGEOJSON' from data source
`C:\sonphamsmu\isss626-aug2526\Take-home_Ex\Take-home_Ex1\data\geospatial\MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson'
using driver `GeoJSON'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
# Spatial join: assign each business to a subzone
acra_all_fnb_sf <- st_join(acra_all_fnb_sf, subzones)tmap_mode("view") # interactive map
tm_shape(subzones) + tm_borders() +
tm_shape(acra_all_fnb_sf) + tm_dots(col = "fnb_bucket", size = 0.05)